Analysing time series data with locations described by longitude and latitude.

  • #### Goal

Understand where users spend most of their time. This can be overall main locations across all times (say over a week) or within a specific time frame (for example a day). Over a week, this would be home and office, and if it is within a day or so, we might have another cluster coming up on Tuesdays's as they might play tennis (or so). The goal is quite open on purpose, please feel free to come up with whatever you think makes the most sense!

  • #### Data

The data is from a research project and gives [timestamp, latitude, longitude]. Feel free to use one file or as many as makes sense for what you need/want to test. The data is from this research project: http://extrasensory.ucsd.edu/ Or from Google Drive here: GDrive data

  • #### Tools/Packages

Please use any python packages that you find useful. As mentioned, DBSCAN might be interesting DBSCAN, DBSCAN Tutorial. But again, there are pros and cons to it. Use whatever you find useful.

Approach

The task is basically unsupervised learning related with clustering of time series data, so basically algorithms working for time series data will work better as they will consider time. Let's try to visualize initially the data with a heat map.

Loading data

In [ ]:
from os.path import join as pjoin
# from google.colab import drive
# mounted_drive_folder = '/content/data'
# data_path = 'ExtraSensory.per_uuid_absolute_location'
# drive.mount(mounted_drive_folder)
In [2]:
# !ls /content/data/My\ Drive/ExtraSensory.per_uuid_absolute_location | head
!ls data | head

Investigate structure of one file

In [3]:
import pandas as pd

# data_path = 'ExtraSensory.per_uuid_absolute_location'
# one_file_path = pjoin(
#     mounted_drive_folder,
#     'My Drive',
#     data_path,
#     '00EABED2-271D-49D8-B599-1D4A09240601.absolute_locations.csv.gz'
# )
one_file_path = pjoin(
    'data',
    '00EABED2-271D-49D8-B599-1D4A09240601.absolute_locations.csv.gz'
)
data = pd.read_csv(one_file_path, nrows=100, compression='gzip')
data.head(10)
Out[3]:
timestamp latitude longitude
0 1444079161 32.882408 -117.234661
1 1444079221 32.882466 -117.234577
2 1444079281 32.882466 -117.234563
3 1444079341 32.882470 -117.234562
4 1444079431 32.882422 -117.234651
5 1444079492 32.882438 -117.234630
6 1444079552 32.882481 -117.234569
7 1444079612 32.882425 -117.234608
8 1444079672 32.882470 -117.234586
9 1444079732 32.882489 -117.234615

Join all files to one time series df

In [4]:
import glob

# data_files_path_pattern = pjoin(
#     mounted_drive_folder,
#     'My Drive',
#     data_path,
#     '*.csv.gz'
# )
data_files_path_pattern = pjoin(
    'data',
    '*.csv.gz'
)


locations_data = pd.concat(
    pd.read_csv(
        file_name,
        compression='gzip',
        error_bad_lines=False
    ) for file_name in glob.glob(data_files_path_pattern)
).sort_values(
    'timestamp',
    ascending=True
).astype(
    {
        'timestamp': 'int',
        'latitude': 'float',
        'longitude': 'float',
    }
)

locations_data['timestamp'] = pd.to_datetime(
    locations_data['timestamp'],
    unit='s'
)
In [5]:
locations_data_ts = locations_data.set_index('timestamp', drop=True)
locations_data_ts.head()
Out[5]:
latitude longitude
timestamp
2015-06-05 20:49:01 32.882536 -117.234579
2015-06-05 20:50:02 32.882489 -117.234689
2015-06-05 20:51:02 32.882494 -117.234705
2015-06-05 20:52:02 32.882494 -117.234685
2015-06-05 20:53:02 32.882492 -117.234674
In [6]:
!pip install gmaps
!pip install geojson
!pip install tslearn
!pip install shapely
!jupyter labextension install @jupyter-widgets/jupyterlab-manager
!jupyter labextension install @bokeh/jupyter_bokeh
!pip install google-api-python-client
!pip install googlemaps
Requirement already satisfied: gmaps in /home/jaworski/anaconda3/lib/python3.7/site-packages (0.9.0)
Requirement already satisfied: six in /home/jaworski/anaconda3/lib/python3.7/site-packages (from gmaps) (1.12.0)
Requirement already satisfied: traitlets>=4.3.0 in /home/jaworski/anaconda3/lib/python3.7/site-packages (from gmaps) (4.3.3)
Requirement already satisfied: ipywidgets>=7.0.0 in /home/jaworski/anaconda3/lib/python3.7/site-packages (from gmaps) (7.2.1)
Requirement already satisfied: ipython>=5.3.0 in /home/jaworski/anaconda3/lib/python3.7/site-packages (from gmaps) (7.9.0)
Requirement already satisfied: geojson>=2.0.0 in /home/jaworski/anaconda3/lib/python3.7/site-packages (from gmaps) (2.5.0)
Requirement already satisfied: ipython-genutils in /home/jaworski/anaconda3/lib/python3.7/site-packages (from traitlets>=4.3.0->gmaps) (0.2.0)
Requirement already satisfied: decorator in /home/jaworski/anaconda3/lib/python3.7/site-packages (from traitlets>=4.3.0->gmaps) (4.4.1)
Requirement already satisfied: nbformat>=4.2.0 in /home/jaworski/anaconda3/lib/python3.7/site-packages (from ipywidgets>=7.0.0->gmaps) (4.4.0)
Requirement already satisfied: ipykernel>=4.5.1 in /home/jaworski/anaconda3/lib/python3.7/site-packages (from ipywidgets>=7.0.0->gmaps) (5.1.3)
Requirement already satisfied: widgetsnbextension~=3.2.0 in /home/jaworski/anaconda3/lib/python3.7/site-packages (from ipywidgets>=7.0.0->gmaps) (3.2.1)
Requirement already satisfied: pickleshare in /home/jaworski/anaconda3/lib/python3.7/site-packages (from ipython>=5.3.0->gmaps) (0.7.5)
Requirement already satisfied: pygments in /home/jaworski/anaconda3/lib/python3.7/site-packages (from ipython>=5.3.0->gmaps) (2.4.2)
Requirement already satisfied: prompt-toolkit<2.1.0,>=2.0.0 in /home/jaworski/anaconda3/lib/python3.7/site-packages (from ipython>=5.3.0->gmaps) (2.0.10)
Requirement already satisfied: setuptools>=18.5 in /home/jaworski/anaconda3/lib/python3.7/site-packages (from ipython>=5.3.0->gmaps) (41.6.0.post20191029)
Requirement already satisfied: pexpect; sys_platform != "win32" in /home/jaworski/anaconda3/lib/python3.7/site-packages (from ipython>=5.3.0->gmaps) (4.7.0)
Requirement already satisfied: backcall in /home/jaworski/anaconda3/lib/python3.7/site-packages (from ipython>=5.3.0->gmaps) (0.1.0)
Requirement already satisfied: jedi>=0.10 in /home/jaworski/anaconda3/lib/python3.7/site-packages (from ipython>=5.3.0->gmaps) (0.15.1)
Requirement already satisfied: jsonschema!=2.5.0,>=2.4 in /home/jaworski/anaconda3/lib/python3.7/site-packages (from nbformat>=4.2.0->ipywidgets>=7.0.0->gmaps) (3.1.1)
Requirement already satisfied: jupyter_core in /home/jaworski/anaconda3/lib/python3.7/site-packages (from nbformat>=4.2.0->ipywidgets>=7.0.0->gmaps) (4.5.0)
Requirement already satisfied: jupyter-client in /home/jaworski/anaconda3/lib/python3.7/site-packages (from ipykernel>=4.5.1->ipywidgets>=7.0.0->gmaps) (5.3.3)
Requirement already satisfied: tornado>=4.2 in /home/jaworski/anaconda3/lib/python3.7/site-packages (from ipykernel>=4.5.1->ipywidgets>=7.0.0->gmaps) (6.0.3)
Requirement already satisfied: notebook>=4.4.1 in /home/jaworski/anaconda3/lib/python3.7/site-packages (from widgetsnbextension~=3.2.0->ipywidgets>=7.0.0->gmaps) (6.0.1)
Requirement already satisfied: wcwidth in /home/jaworski/anaconda3/lib/python3.7/site-packages (from prompt-toolkit<2.1.0,>=2.0.0->ipython>=5.3.0->gmaps) (0.1.7)
Requirement already satisfied: ptyprocess>=0.5 in /home/jaworski/anaconda3/lib/python3.7/site-packages (from pexpect; sys_platform != "win32"->ipython>=5.3.0->gmaps) (0.6.0)
Requirement already satisfied: parso>=0.5.0 in /home/jaworski/anaconda3/lib/python3.7/site-packages (from jedi>=0.10->ipython>=5.3.0->gmaps) (0.5.1)
Requirement already satisfied: pyrsistent>=0.14.0 in /home/jaworski/anaconda3/lib/python3.7/site-packages (from jsonschema!=2.5.0,>=2.4->nbformat>=4.2.0->ipywidgets>=7.0.0->gmaps) (0.15.5)
Requirement already satisfied: importlib-metadata in /home/jaworski/anaconda3/lib/python3.7/site-packages (from jsonschema!=2.5.0,>=2.4->nbformat>=4.2.0->ipywidgets>=7.0.0->gmaps) (0.23)
Requirement already satisfied: attrs>=17.4.0 in /home/jaworski/anaconda3/lib/python3.7/site-packages (from jsonschema!=2.5.0,>=2.4->nbformat>=4.2.0->ipywidgets>=7.0.0->gmaps) (19.3.0)
Requirement already satisfied: python-dateutil>=2.1 in /home/jaworski/anaconda3/lib/python3.7/site-packages (from jupyter-client->ipykernel>=4.5.1->ipywidgets>=7.0.0->gmaps) (2.8.0)
Requirement already satisfied: pyzmq>=13 in /home/jaworski/anaconda3/lib/python3.7/site-packages (from jupyter-client->ipykernel>=4.5.1->ipywidgets>=7.0.0->gmaps) (18.1.0)
Requirement already satisfied: prometheus-client in /home/jaworski/anaconda3/lib/python3.7/site-packages (from notebook>=4.4.1->widgetsnbextension~=3.2.0->ipywidgets>=7.0.0->gmaps) (0.7.1)
Requirement already satisfied: nbconvert in /home/jaworski/anaconda3/lib/python3.7/site-packages (from notebook>=4.4.1->widgetsnbextension~=3.2.0->ipywidgets>=7.0.0->gmaps) (5.6.1)
Requirement already satisfied: terminado>=0.8.1 in /home/jaworski/anaconda3/lib/python3.7/site-packages (from notebook>=4.4.1->widgetsnbextension~=3.2.0->ipywidgets>=7.0.0->gmaps) (0.8.2)
Requirement already satisfied: jinja2 in /home/jaworski/anaconda3/lib/python3.7/site-packages (from notebook>=4.4.1->widgetsnbextension~=3.2.0->ipywidgets>=7.0.0->gmaps) (2.10.3)
Requirement already satisfied: Send2Trash in /home/jaworski/anaconda3/lib/python3.7/site-packages (from notebook>=4.4.1->widgetsnbextension~=3.2.0->ipywidgets>=7.0.0->gmaps) (1.5.0)
Requirement already satisfied: zipp>=0.5 in /home/jaworski/anaconda3/lib/python3.7/site-packages (from importlib-metadata->jsonschema!=2.5.0,>=2.4->nbformat>=4.2.0->ipywidgets>=7.0.0->gmaps) (0.6.0)
Requirement already satisfied: pandocfilters>=1.4.1 in /home/jaworski/anaconda3/lib/python3.7/site-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.2.0->ipywidgets>=7.0.0->gmaps) (1.4.2)
Requirement already satisfied: defusedxml in /home/jaworski/anaconda3/lib/python3.7/site-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.2.0->ipywidgets>=7.0.0->gmaps) (0.6.0)
Requirement already satisfied: bleach in /home/jaworski/anaconda3/lib/python3.7/site-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.2.0->ipywidgets>=7.0.0->gmaps) (3.1.0)
Requirement already satisfied: testpath in /home/jaworski/anaconda3/lib/python3.7/site-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.2.0->ipywidgets>=7.0.0->gmaps) (0.4.2)
Requirement already satisfied: mistune<2,>=0.8.1 in /home/jaworski/anaconda3/lib/python3.7/site-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.2.0->ipywidgets>=7.0.0->gmaps) (0.8.4)
Requirement already satisfied: entrypoints>=0.2.2 in /home/jaworski/anaconda3/lib/python3.7/site-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.2.0->ipywidgets>=7.0.0->gmaps) (0.3)
Requirement already satisfied: MarkupSafe>=0.23 in /home/jaworski/anaconda3/lib/python3.7/site-packages (from jinja2->notebook>=4.4.1->widgetsnbextension~=3.2.0->ipywidgets>=7.0.0->gmaps) (1.1.1)
Requirement already satisfied: more-itertools in /home/jaworski/anaconda3/lib/python3.7/site-packages (from zipp>=0.5->importlib-metadata->jsonschema!=2.5.0,>=2.4->nbformat>=4.2.0->ipywidgets>=7.0.0->gmaps) (7.2.0)
Requirement already satisfied: webencodings in /home/jaworski/anaconda3/lib/python3.7/site-packages (from bleach->nbconvert->notebook>=4.4.1->widgetsnbextension~=3.2.0->ipywidgets>=7.0.0->gmaps) (0.5.1)
Requirement already satisfied: geojson in /home/jaworski/anaconda3/lib/python3.7/site-packages (2.5.0)
Requirement already satisfied: tslearn in /home/jaworski/anaconda3/lib/python3.7/site-packages (0.1.29)
Requirement already satisfied: numpy in /home/jaworski/anaconda3/lib/python3.7/site-packages (from tslearn) (1.16.4)
Requirement already satisfied: Cython in /home/jaworski/anaconda3/lib/python3.7/site-packages (from tslearn) (0.29.10)
Requirement already satisfied: scipy in /home/jaworski/anaconda3/lib/python3.7/site-packages (from tslearn) (1.2.1)
Requirement already satisfied: scikit-learn in /home/jaworski/anaconda3/lib/python3.7/site-packages (from tslearn) (0.22.1)
Requirement already satisfied: joblib>=0.11 in /home/jaworski/anaconda3/lib/python3.7/site-packages (from scikit-learn->tslearn) (0.13.2)
Requirement already satisfied: shapely in /home/jaworski/anaconda3/lib/python3.7/site-packages (1.7.0)
Building jupyterlab assets (build:prod:minimize)
Building jupyterlab assets (build:prod:minimize)
Requirement already satisfied: google-api-python-client in /home/jaworski/anaconda3/lib/python3.7/site-packages (1.8.0)
Requirement already satisfied: uritemplate<4dev,>=3.0.0 in /home/jaworski/anaconda3/lib/python3.7/site-packages (from google-api-python-client) (3.0.1)
Requirement already satisfied: httplib2<1dev,>=0.9.2 in /home/jaworski/anaconda3/lib/python3.7/site-packages (from google-api-python-client) (0.17.0)
Requirement already satisfied: six<2dev,>=1.6.1 in /home/jaworski/anaconda3/lib/python3.7/site-packages (from google-api-python-client) (1.12.0)
Requirement already satisfied: google-auth-httplib2>=0.0.3 in /home/jaworski/anaconda3/lib/python3.7/site-packages (from google-api-python-client) (0.0.3)
Requirement already satisfied: google-auth>=1.4.1 in /home/jaworski/anaconda3/lib/python3.7/site-packages (from google-api-python-client) (1.11.0)
Requirement already satisfied: google-api-core<2dev,>=1.13.0 in /home/jaworski/anaconda3/lib/python3.7/site-packages (from google-api-python-client) (1.16.0)
Requirement already satisfied: pyasn1-modules>=0.2.1 in /home/jaworski/anaconda3/lib/python3.7/site-packages (from google-auth>=1.4.1->google-api-python-client) (0.2.8)
Requirement already satisfied: rsa<4.1,>=3.1.4 in /home/jaworski/anaconda3/lib/python3.7/site-packages (from google-auth>=1.4.1->google-api-python-client) (4.0)
Requirement already satisfied: cachetools<5.0,>=2.0.0 in /home/jaworski/anaconda3/lib/python3.7/site-packages (from google-auth>=1.4.1->google-api-python-client) (4.0.0)
Requirement already satisfied: setuptools>=40.3.0 in /home/jaworski/anaconda3/lib/python3.7/site-packages (from google-auth>=1.4.1->google-api-python-client) (41.6.0.post20191029)
Requirement already satisfied: protobuf>=3.4.0 in /home/jaworski/anaconda3/lib/python3.7/site-packages (from google-api-core<2dev,>=1.13.0->google-api-python-client) (3.8.0)
Requirement already satisfied: requests<3.0.0dev,>=2.18.0 in /home/jaworski/anaconda3/lib/python3.7/site-packages (from google-api-core<2dev,>=1.13.0->google-api-python-client) (2.22.0)
Requirement already satisfied: pytz in /home/jaworski/anaconda3/lib/python3.7/site-packages (from google-api-core<2dev,>=1.13.0->google-api-python-client) (2019.1)
Requirement already satisfied: googleapis-common-protos<2.0dev,>=1.6.0 in /home/jaworski/anaconda3/lib/python3.7/site-packages (from google-api-core<2dev,>=1.13.0->google-api-python-client) (1.51.0)
Requirement already satisfied: pyasn1<0.5.0,>=0.4.6 in /home/jaworski/anaconda3/lib/python3.7/site-packages (from pyasn1-modules>=0.2.1->google-auth>=1.4.1->google-api-python-client) (0.4.8)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /home/jaworski/anaconda3/lib/python3.7/site-packages (from requests<3.0.0dev,>=2.18.0->google-api-core<2dev,>=1.13.0->google-api-python-client) (1.24.2)
Requirement already satisfied: idna<2.9,>=2.5 in /home/jaworski/anaconda3/lib/python3.7/site-packages (from requests<3.0.0dev,>=2.18.0->google-api-core<2dev,>=1.13.0->google-api-python-client) (2.8)
Requirement already satisfied: chardet<3.1.0,>=3.0.2 in /home/jaworski/anaconda3/lib/python3.7/site-packages (from requests<3.0.0dev,>=2.18.0->google-api-core<2dev,>=1.13.0->google-api-python-client) (3.0.4)
Requirement already satisfied: certifi>=2017.4.17 in /home/jaworski/anaconda3/lib/python3.7/site-packages (from requests<3.0.0dev,>=2.18.0->google-api-core<2dev,>=1.13.0->google-api-python-client) (2019.11.28)
Requirement already satisfied: googlemaps in /home/jaworski/anaconda3/lib/python3.7/site-packages (4.2.0)
Requirement already satisfied: requests<3.0,>=2.20.0 in /home/jaworski/anaconda3/lib/python3.7/site-packages (from googlemaps) (2.22.0)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /home/jaworski/anaconda3/lib/python3.7/site-packages (from requests<3.0,>=2.20.0->googlemaps) (1.24.2)
Requirement already satisfied: certifi>=2017.4.17 in /home/jaworski/anaconda3/lib/python3.7/site-packages (from requests<3.0,>=2.20.0->googlemaps) (2019.11.28)
Requirement already satisfied: idna<2.9,>=2.5 in /home/jaworski/anaconda3/lib/python3.7/site-packages (from requests<3.0,>=2.20.0->googlemaps) (2.8)
Requirement already satisfied: chardet<3.1.0,>=3.0.2 in /home/jaworski/anaconda3/lib/python3.7/site-packages (from requests<3.0,>=2.20.0->googlemaps) (3.0.4)
In [7]:
# Some import snippets taken from previous Cracow air pollution research 
import warnings
import cufflinks as cf
import plotly.offline as offline
cf.go_offline()
offline.init_notebook_mode()

from bokeh.io import show, output_notebook
from bokeh.models import ColumnDataSource, GMapOptions
from bokeh.plotting import gmap
from bokeh.models import HoverTool
from bokeh.io import output_file, show, push_notebook
from bokeh.models.widgets import CheckboxGroup
from bokeh.models.widgets import DateRangeSlider
from bokeh.models import CustomJS
from bokeh.layouts import layout
from datetime import datetime
from bokeh.palettes import Spectral11
from bokeh.plotting import figure

# Plotting whisker
from bokeh.models import ColumnDataSource, Whisker
from bokeh.plotting import figure, show
from bokeh.sampledata.autompg import autompg as df
import gmaps
import gmaps.datasets
import geojson
from geojson import Feature, \
  Point, \
  FeatureCollection, \
  LineString, \
  Polygon, \
  MultiPolygon

#imputation of missing values
from sklearn.cluster import KMeans, DBSCAN
from tslearn.clustering import silhouette_score, TimeSeriesKMeans
import ipywidgets as widgets
from datetime import datetime
from IPython.display import display
from ipywidgets.embed import embed_minimal_html

warnings.filterwarnings('ignore')
output_notebook()
/home/jaworski/anaconda3/lib/python3.7/site-packages/sklearn/utils/deprecation.py:144: FutureWarning:

The sklearn.cluster.k_means_ module is  deprecated in version 0.22 and will be removed in version 0.24. The corresponding classes / functions should instead be imported from sklearn.cluster. Anything that cannot be imported from sklearn.cluster is now part of the private API.

Loading BokehJS ...
In [8]:
GOOGLE_API_KEY = 'AIzaSyCvXOQ2bgpTlze4a6DOyC24jDqWylRu9Ck'

localize_latitude = locations_data_ts['latitude'].median()
localize_longitude = locations_data_ts['longitude'].median()
map_options = GMapOptions(
    lat=localize_latitude,
    lng=localize_longitude,
    map_type='roadmap',
    zoom=11
)
source = ColumnDataSource(
    data=locations_data_ts[::3]
)

p = gmap(
    GOOGLE_API_KEY,
    map_options,
    title='Tracking of user positions',
    plot_width=1200,
    plot_height=1200
)
circles = p.circle(
    x='longitude',
    y='latitude',
    size=8,
    fill_color='blue',
    fill_alpha=0.8,
    source=source
)

hover = HoverTool(tooltips=[('index', '$index'),
    ('(longitude,latitude)', '(@longitude, @latitude)'),
    ('sensor id', '@id')], renderers=[circles])


p.add_tools(hover)
output_file('resampled_coordinates.html')
show(p)

Check if there are any NaN values

In [9]:
len(locations_data_ts[locations_data_ts.isnull().any(axis=1)]), len(locations_data_ts)
Out[9]:
(71599, 377346)

Interpolate NaN values

In [10]:
import numpy as np

# There are more complex interpolation methods which can be useful here
locations_data_ts_interp = locations_data_ts.interpolate(method='linear')
len(locations_data_ts_interp[locations_data_ts_interp.isnull().any(axis=1)])
Out[10]:
0

Clusterizing visually by heatmap layer method for google maps

In [11]:
fig = gmaps.figure()
gmaps.configure(api_key=GOOGLE_API_KEY)
locations = list(
    zip(
        locations_data_ts_interp['latitude'],
        locations_data_ts_interp['longitude']
    )
)

weights = [1 for _ in range(len(locations))]
if min(weights) < 0:
    weights = list(map(lambda weight: weight - min(weights), weights))

fig = gmaps.figure(center=(localize_latitude, localize_longitude,), zoom_level=10)
layer_of_tracker_map = gmaps.heatmap_layer(locations, weights=weights, dissipating = True)
layer_of_tracker_map.point_radius = 5
layer_of_tracker_map.max_intensity = 100
fig.add_layer(layer_of_tracker_map)
fig
In [12]:
embed_minimal_html('pure_heatmap.html', views=[fig])

Here we can find some red zones where there user was the most of the time, though the API doesn't provide a way to find cluster centers. Let's find these clusters by DBSCAN algorithm (Other options as time series clustering algorithm TimeSeriesKMeans can be used)

(H)DBSCAN clustering

Let's estimate epsilon parameter as the maximum distance between points to consider them being in one cluster. Let's assume 2 km distance, then having kms_per_radian=6371.0088 we can compute a value for epsilon in terms of lat and lon. We can tweak min_samples from 1 to larger value to consider some points as a noise.

We need to use haversine metric and ball tree algorithm to calculate great circle distances between points. Epsilon and latitude and longitude coordinates should be converted to radians.

Due to this explanation: https://stackoverflow.com/questions/44131411/dbscan-handling-big-data-crashes-and-memory-error and https://github.com/scikit-learn/scikit-learn/issues/5275 we can easily go out of memory. Just not to go out of memory here I resample the original dataset taking e.g. every 20th.

In [13]:
from sklearn.cluster import DBSCAN

np.random.seed(42)

# Scaling can be done with time windows
kms_per_radian = 6371.0088
eps = .5 / kms_per_radian
# coordinates_stantardized = StandardScaler().fit_transform(locations_data_ts_interp)
resampling_rate = 10
coordinates = locations_data_ts_interp.iloc[::resampling_rate].as_matrix()
radian_coordinates = np.radians(coordinates)
dbscan = DBSCAN(eps=eps, min_samples=5).fit(radian_coordinates)
labels = dbscan.labels_
number_of_clusters = len(set(labels))
clusters = pd.Series([coordinates[labels == n] for n in range(number_of_clusters)])
print('Number of clusters: {}'.format(number_of_clusters))
Number of clusters: 182
In [14]:
from geopy.distance import great_circle
from shapely.geometry import MultiPoint

def find_centermost_point(cluster):
    if not len(cluster):
        return (None, None)
    centroid = (MultiPoint(cluster).centroid.x, MultiPoint(cluster).centroid.y)
    centermost_point = min(cluster, key=lambda point: great_circle(point, centroid).m)
    return tuple(centermost_point)
centermost_points = clusters.map(find_centermost_point)
In [15]:
cluster_center_values = [(lon, lat) for lon, lat in centermost_points.values if lon is not None and lat is not None]
In [16]:
fig = gmaps.figure()
gmaps.configure(api_key=GOOGLE_API_KEY)
locations = list(
    zip(
        locations_data_ts_interp['latitude'],
        locations_data_ts_interp['longitude']
    )
)

weights = [1 for _ in range(len(locations))]
if min(weights) < 0:
    weights = list(map(lambda weight: weight - min(weights), weights))

fig = gmaps.figure(center=(localize_latitude, localize_longitude,), zoom_level=10)
layer_of_tracker_map = gmaps.heatmap_layer(locations, weights=weights, dissipating = True)
layer_of_tracker_map.point_radius = 5
layer_of_tracker_map.max_intensity = 100
fig.add_layer(layer_of_tracker_map)

symbols = gmaps.symbol_layer(cluster_center_values, stroke_color=['blue']*len(cluster_center_values), fill_color=['blue']*len(cluster_center_values))
fig.add_layer(symbols)

fig
In [17]:
embed_minimal_html('initial_clustering_attempt.html', views=[fig])

Another way to undersample is to find those samples when some sort of speed of user was not large, then user stayed in the same place for a while. It can be done by searching for the local minimum points e.g. by the following way:

In [18]:
from scipy.signal import argrelextrema
local_min_order = 2
speed = np.diff(np.sqrt(radian_coordinates[:, 0] ** 2 + radian_coordinates[:, 1] ** 2))
min_speed_indices = argrelextrema(speed, np.less, order=local_min_order)[0]
eps = 0.1 / kms_per_radian

coordinates = locations_data_ts_interp.iloc[min_speed_indices].as_matrix()
radian_coordinates = np.radians(coordinates)
dbscan = DBSCAN(eps=eps, min_samples=5).fit(radian_coordinates)
labels = dbscan.labels_
number_of_clusters = len(set(labels))
clusters = pd.Series([coordinates[labels == n] for n in range(number_of_clusters)])
print('Number of clusters: {}'.format(number_of_clusters))
Number of clusters: 80
In [19]:
centermost_points = clusters.map(find_centermost_point)
cluster_center_values = [(lon, lat) for lon, lat in centermost_points.values if lon is not None and lat is not None]
In [20]:
# Compute support per every cluster center as a number of points in a cluster
# And take value from the spectrum to visualize point
support_values = [len(val) for val in clusters.values if len(val)]
spectral_val_fraction = 1. / len(Spectral11)
starting = 0
ranged_to_color = {}
for idx in range(len(Spectral11)):
    ranged_to_color[(starting, starting + spectral_val_fraction)] = Spectral11[idx]
    starting += spectral_val_fraction

max_val = max(support_values)
support_values_normalized = [float(i)/max_val for i in support_values]
mypalette = [[val for key, val in ranged_to_color.items() if key[0] < support < key[1]][0] for support in support_values_normalized]
In [21]:
# Find cluster center location names
import googlemaps as gmaps_api_cli
gmaps_cli = gmaps_api_cli.Client(key=GOOGLE_API_KEY)
cluster_center_geocodes = [gmaps_cli.reverse_geocode(val) for val in cluster_center_values]
In [22]:
cluster_center_names = [geocode[0]['formatted_address'] for geocode in cluster_center_geocodes]
In [23]:
NUMBER_OF_PLACED_TO_TAKE = 4
places_nearby_cluster_centers = [gmaps_cli.places_nearby(el, 50, type='specific business type') for el in cluster_center_values]
places_nearby_cluster_centers = ['; '.join([el['name'] for el in place['results']][:NUMBER_OF_PLACED_TO_TAKE]) for place in places_nearby_cluster_centers]
In [24]:
hover_description_text = [f'{gcode_name}; {place_name}; Support: {support_val}'
                          for gcode_name, place_name, support_val in zip(cluster_center_names, places_nearby_cluster_centers, support_values)]

Output top visited places by cluster support

In [25]:
places_df = pd.DataFrame({
    'place_names': places_nearby_cluster_centers,
    'cluster_center_names': cluster_center_names,
    'cluster_center_values': cluster_center_values,
    'support_values': support_values
}).sort_values('support_values', ascending=False)
pd.set_option('display.max_colwidth', -1)
places_df[places_df.support_values>100]
Out[25]:
place_names cluster_center_names cluster_center_values support_values
0 Myers Drive; UC San Diego Bookstore; Montero, Maria Isa; The Zone 9500 Gilman Dr, La Jolla, CA 92093, USA (32.879546999999995, -117.23675700000001) 2768
34 100-198 W Hawthorn St; Sage Payment Solutions; South Coast Landscape; RD Alchemy Natural Products 138 W Hawthorn St, San Diego, CA 92101, USA (32.727299, -117.164635) 760
26 Solana Beach; Solana Highlands Apartments 701 S Nardo Ave, Solana Beach, CA 92075, USA (32.984483000000004, -117.261376) 491
14 Sixth Lane; Sixth College Matthews Apartments; Triton Athletic Training; Warren Field 43 Sixth Ln, La Jolla, CA 92093, USA (32.879015, -117.231573) 412
55 6711-6727 Limonite Ct; Carlsbad 6709 Limonite Ct, Carlsbad, CA 92009, USA (33.109873, -117.26795600000001) 343
68 521-699 Nautilus St; Manor Garage Moling Group LLC; San Diego 646 Nautilus St, La Jolla, CA 92037, USA (32.832512, -117.2746) 266
67 2720 Ocean Front, Del Mar, CA 92014, USA (32.97116, -117.2712625) 181
7 7901-7961 Avenida Navidad; San Diego 7954 Avenida Navidad, San Diego, CA 92122, USA (32.865014, -117.210006) 176
13 San Diego; UCSD Shiley-Marcos Alzheimer's Disease Research Center; East Campus Office Building; UC San Diego Health - Neurology, La Jolla 9444 Medical Center Dr, La Jolla, CA 92037, USA (32.876670000000004, -117.22750800000001) 132
40 North Greensview Drive; Chula Vista 2300 Greenbriar Dr, Chula Vista, CA 91915, USA (32.645376982625486, -116.96328304054056) 114
In [26]:
pd.reset_option('display.max_colwidth')

Place names can be also processed to find the most common tokens and find the exact place names the user visited, different parameters of the model can be tuned, some metrics such as silhouette score, within cluster sum of squares and elbow rule can be used to tweak the result to the optimal number of clusters.

It's quite possible that the user buys books or works at: "UC San Diego Bookstore",

works or uses services at: 100-198 W Hawthorn St; Sage Payment Solutions;

rents apartments: Solana Highlands Apartments Sixth College Matthews Apartments

the user can be a medical colleague student, worker or client, visits medical center.

Visualize the result with hover text information for every cluster centroid and palette extracted from normalized support and vector of 11 colors

In [27]:
fig = gmaps.figure()
gmaps.configure(api_key=GOOGLE_API_KEY)
locations = list(
    zip(
        locations_data_ts_interp['latitude'],
        locations_data_ts_interp['longitude']
    )
)

weights = [1 for _ in range(len(locations))]
if min(weights) < 0:
    weights = list(map(lambda weight: weight - min(weights), weights))

fig = gmaps.figure(center=(localize_latitude, localize_longitude,), zoom_level=10)
layer_of_tracker_map = gmaps.heatmap_layer(locations, weights=weights, dissipating = True)
layer_of_tracker_map.point_radius = 5
layer_of_tracker_map.max_intensity = 100
fig.add_layer(layer_of_tracker_map)

symbols = gmaps.symbol_layer(
    cluster_center_values,
    stroke_color=mypalette,
    fill_color=mypalette,
    hover_text=hover_description_text,
    display_info_box=True,
    info_box_content=hover_description_text,
    scale=4
)
fig.add_layer(symbols)

fig
In [28]:
embed_minimal_html('clustering_result.html', views=[fig])

Filtering by time

In [29]:
# Requires bokeh server
# min_date = locations_data_ts.index.min()
# max_date = locations_data_ts.index.max()
# date_range_slider = DateRangeSlider(
#     title='Date Range: ',
#     start=min_date,
#     end=max_date,
#     value=(
#         min_date,
#         max_date
#     ),
#     step=1
# )

# def plot_sensor_time_series(locations_data_ts_filtered):
#     map_options = GMapOptions(
#         lat=localize_latitude,
#         lng=localize_longitude,
#         map_type='roadmap',
#         zoom=11
#     )
#     source = ColumnDataSource(
#         data=locations_data_ts_filtered
#     )

#     p = gmap(
#         GOOGLE_API_KEY,
#         map_options,
#         title='Tracking of user positions',
#         plot_width=1200,
#         plot_height=1200
#     )
#     circles = p.circle(
#         x='longitude',
#         y='latitude',
#         size=8,
#         fill_color='blue',
#         fill_alpha=0.8,
#         source=source
#     )

#     hover = HoverTool(tooltips=[('index', '$index'),
#         ('(longitude,latitude)', '(@longitude, @latitude)'),
#         ('sensor id', '@id')], renderers=[circles])


#     p.add_tools(hover)
#     show(p)


# def date_range_change_handler(attr, old, new):
#     start_date, end_date = new
#     df_to_plot = locations_data_ts.loc[start_date:end_date, float_sensor_data_columns]
#     plot_sensor_time_series(df_to_plot)
    
# date_range_slider.on_change('value', date_range_change_handler)


# l = layout(children=[[date_range_slider]], sizing_mode='scale_width')

# handle = show(l, notebook_handle=True)

# push_notebook(handle=handle)
In [30]:
# All conda dependencies
!pip list
Package                            Version            
---------------------------------- -------------------
absl-py                            0.7.1              
alabaster                          0.7.12             
anaconda-clean                     1.0                
anaconda-client                    1.7.2              
anaconda-navigator                 1.9.7              
anaconda-project                   0.8.2              
asn1crypto                         0.24.0             
astor                              0.7.1              
astroid                            2.2.5              
astropy                            3.2.1              
atomicwrites                       1.3.0              
attrs                              19.3.0             
audioread                          2.1.8              
Babel                              2.7.0              
backcall                           0.1.0              
backports.functools-lru-cache      1.5                
backports.os                       0.1.1              
backports.shutil-get-terminal-size 1.0.0              
backports.tempfile                 1.0                
backports.weakref                  1.0.post1          
beautifulsoup4                     4.7.1              
bitarray                           0.9.3              
bkcharts                           0.2                
bleach                             3.1.0              
bokeh                              1.2.0              
boto                               2.49.0             
boto3                              1.9.169            
botocore                           1.12.169           
Bottleneck                         1.2.1              
cachetools                         4.0.0              
certifi                            2019.11.28         
cffi                               1.12.3             
changefinder                       0.3                
chardet                            3.0.4              
Click                              7.0                
cloudpickle                        1.1.1              
clyent                             1.2.2              
colorama                           0.4.1              
colorlover                         0.3.0              
conda                              4.8.2              
conda-build                        3.17.8             
conda-package-handling             1.3.9              
conda-verify                       3.4.2              
contextlib2                        0.5.5              
convertdate                        2.1.3              
cryptography                       2.7                
cufflinks                          0.16               
cycler                             0.10.0             
Cython                             0.29.10            
cytoolz                            0.9.0.1            
dask                               1.2.2              
decorator                          4.4.1              
defusedxml                         0.6.0              
distributed                        1.28.1             
docutils                           0.14               
entrypoints                        0.3                
ephem                              3.7.7.0            
et-xmlfile                         1.0.1              
fancycompleter                     0.8                
fastcache                          1.1.0              
fastparquet                        0.3.1              
fbprophet                          0.5                
filelock                           3.0.12             
Flask                              1.0.3              
future                             0.17.1             
gast                               0.2.2              
geographiclib                      1.50               
geojson                            2.5.0              
geopy                              1.21.0             
gevent                             1.4.0              
glob2                              0.6                
gmaps                              0.9.0              
gmplot                             1.2.0              
gmpy2                              2.0.8              
google-api-core                    1.16.0             
google-api-python-client           1.8.0              
google-auth                        1.11.0             
google-auth-httplib2               0.0.3              
google-cloud-iot                   0.3.0              
googleapis-common-protos           1.51.0             
googlemaps                         4.2.0              
greenlet                           0.4.15             
grpc-google-iam-v1                 0.12.3             
grpcio                             1.16.1             
h2o                                3.24.0.5           
h5py                               2.9.0              
heapdict                           1.0.0              
hmmlearn                           0.2.2              
holidays                           0.9.11             
html5lib                           1.0.1              
httplib2                           0.17.0             
idna                               2.8                
imageio                            2.5.0              
imagesize                          1.1.0              
imbalanced-learn                   0.6.1              
imblearn                           0.0                
importlib-metadata                 0.23               
ipdb                               0.12               
ipykernel                          5.1.3              
ipython                            7.9.0              
ipython-genutils                   0.2.0              
ipywidgets                         7.2.1              
isort                              4.3.20             
itsdangerous                       1.1.0              
jdcal                              1.4.1              
jedi                               0.15.1             
jeepney                            0.4                
Jinja2                             2.10.3             
jmespath                           0.9.4              
joblib                             0.13.2             
json5                              0.8.5              
jsonschema                         3.1.1              
jupyter                            1.0.0              
jupyter-client                     5.3.3              
jupyter-console                    6.0.0              
jupyter-core                       4.5.0              
jupyterlab                         1.2.0              
jupyterlab-server                  1.0.6              
jwt                                0.6.1              
Keras                              2.2.4              
Keras-Applications                 1.0.8              
Keras-Preprocessing                1.1.0              
keyring                            18.0.0             
kiwisolver                         1.1.0              
lazy-object-proxy                  1.4.1              
libarchive-c                       2.8                
librosa                            0.7.0              
lief                               0.9.0              
llvmlite                           0.29.0             
locket                             0.2.0              
lunardate                          0.2.0              
lxml                               4.3.3              
Markdown                           3.1.1              
MarkupSafe                         1.1.1              
matplotlib                         3.1.0              
mccabe                             0.6.1              
minio                              4.0.18             
mistune                            0.8.4              
mkl-fft                            1.0.12             
mkl-random                         1.0.2              
mkl-service                        2.0.2              
mock                               3.0.5              
more-itertools                     7.2.0              
mpmath                             1.1.0              
msgpack                            0.6.1              
multipledispatch                   0.6.0              
navigator-updater                  0.2.1              
nbconvert                          5.6.1              
nbformat                           4.4.0              
networkx                           2.3                
nilm-metadata                      0.2.3              
nilmtk                             0.3.0.dev-412be54  
nltk                               3.4.1              
noisereduce                        1.0.1              
nose                               1.3.7              
notebook                           6.0.1              
numba                              0.44.1             
numexpr                            2.6.9              
numpy                              1.16.4             
numpydoc                           0.9.1              
olefile                            0.46               
openpyxl                           2.6.2              
packaging                          19.0               
paho-mqtt                          1.5.0              
pandas                             0.24.2             
pandocfilters                      1.4.2              
parso                              0.5.1              
partd                              0.3.10             
path.py                            12.0.1             
pathlib2                           2.3.3              
patsy                              0.5.1              
pdbpp                              0.10.2             
pep8                               1.7.1              
pexpect                            4.7.0              
pick                               0.6.4              
pickleshare                        0.7.5              
pika                               1.1.0              
Pillow                             6.0.0              
pip                                19.3.1             
pkginfo                            1.5.0.1            
plotly                             3.10.0             
pluggy                             0.12.0             
ply                                3.11               
prometheus-client                  0.7.1              
prompt-toolkit                     2.0.10             
protobuf                           3.8.0              
psutil                             5.6.2              
psycopg2-binary                    2.8.3              
ptyprocess                         0.6.0              
py                                 1.8.0              
py4j                               0.10.7             
pyasn1                             0.4.8              
pyasn1-modules                     0.2.8              
pycodestyle                        2.5.0              
pycosat                            0.6.3              
pycparser                          2.19               
pycrypto                           2.6.1              
pycurl                             7.43.0.5           
pyflakes                           2.1.1              
Pygments                           2.4.2              
PyJWT                              1.7.1              
pykalman                           0.9.5              
pylint                             2.3.1              
pyodbc                             4.0.26             
pyOpenSSL                          19.0.0             
pyparsing                          2.4.0              
pyrsistent                         0.15.5             
PySocks                            1.7.0              
pyspark                            2.4.3              
pystan                             2.19.0.0           
pytest                             4.6.2              
pytest-arraydiff                   0.3                
pytest-astropy                     0.5.0              
pytest-doctestplus                 0.3.0              
pytest-html                        1.22.0             
pytest-metadata                    1.8.0              
pytest-openfiles                   0.3.2              
pytest-remotedata                  0.3.1              
python-dateutil                    2.8.0              
python-snappy                      0.5.4              
pytz                               2019.1             
PyWavelets                         1.0.3              
PyYAML                             5.1                
pyzmq                              18.1.0             
QtAwesome                          0.5.7              
qtconsole                          4.5.1              
QtPy                               1.7.1              
redis                              3.2.1              
requests                           2.22.0             
resampy                            0.2.2              
retrying                           1.3.3              
rope                               0.14.0             
rsa                                4.0                
ruamel-yaml                        0.15.46            
ruptures                           1.0.1              
s3transfer                         0.2.1              
scikit-image                       0.15.0             
scikit-learn                       0.22.1             
scipy                              1.2.1              
seaborn                            0.9.0              
SecretStorage                      3.1.1              
Send2Trash                         1.5.0              
setuptools                         41.6.0.post20191029
setuptools-git                     1.2                
Shapely                            1.7.0              
simplegeneric                      0.8.1              
singledispatch                     3.4.0.3            
six                                1.12.0             
snowballstemmer                    1.2.1              
sortedcollections                  1.1.2              
sortedcontainers                   2.1.0              
SoundFile                          0.10.2             
soupsieve                          1.8                
Sphinx                             2.1.0              
sphinxcontrib-applehelp            1.0.1              
sphinxcontrib-devhelp              1.0.1              
sphinxcontrib-htmlhelp             1.0.2              
sphinxcontrib-jsmath               1.0.1              
sphinxcontrib-qthelp               1.0.2              
sphinxcontrib-serializinghtml      1.1.3              
sphinxcontrib-websupport           1.1.2              
spyder                             3.3.4              
spyder-kernels                     0.4.4              
SQLAlchemy                         1.3.4              
statsmodels                        0.10.1             
sympy                              1.4                
tables                             3.5.2              
tabulate                           0.8.3              
tblib                              1.4.0              
tensorboard                        1.13.1             
tensorflow                         1.13.1             
tensorflow-estimator               1.13.0             
termcolor                          1.1.0              
terminado                          0.8.2              
testpath                           0.4.2              
thrift                             0.11.0             
tkcalendar                         1.5.0              
toolz                              0.9.0              
tornado                            6.0.3              
tqdm                               4.32.1             
traitlets                          4.3.3              
tslearn                            0.1.29             
ujson                              1.35               
unicodecsv                         0.14.1             
uritemplate                        3.0.1              
urllib3                            1.24.2             
wcwidth                            0.1.7              
webencodings                       0.5.1              
Werkzeug                           0.15.4             
wheel                              0.33.6             
widgetsnbextension                 3.2.1              
wmctrl                             0.3                
wrapt                              1.11.1             
wurlitzer                          1.0.2              
xgboost                            0.90               
xlrd                               1.2.0              
XlsxWriter                         1.1.8              
xlwt                               1.3.0              
yolk3k                             0.9                
zict                               0.1.4              
zipp                               0.6.0              
In [ ]: